会话言论通常在话语水平上以松散的句法结构体现,但同时表现出连续话语的局部相干关系。事先工作已经表明,使用经常性神经网络或长短期存储器语言模型(LM)捕获较长的上下文信息可能遭受最近的偏置,而不是在远程上下文中。为了捕获词语和跨越话语之间的长期语义互动,我们提出了对话语音的自动语音识别(ASR)中语言建模的不同谈话历史融合方法。此外,引入了一种新的函数融合机制,该机制被引入熔断器并利用当前话语的声学嵌入和其相应的对话历史的语义含量以协作方式。为了塑造我们的想法,我们将ASR N-Best假设救援人员框架作为预测问题,利用BERT,一个标志性的预训练LM,作为成分车辆,以便于从给定的N最佳假设列表中选择Oracle假设。在AMI基准数据集上进行的实证实验似乎展示了我们对某些目前的线上的方法相关的可行性和功效。
translated by 谷歌翻译
端到端(E2E)神经建模已成为开发计算机辅助语言培训(CAPT)系统的一个主要思想,对基于传统发音评分的方法表示竞争性能。然而,所需的当前E2E神经方法面临至少两个关键挑战。一方面,大多数E2E方法以自回归方式使用左右波束搜索操作,以指示L2学习者的发音。然而,这导致推理速度非常慢,这不可避免地阻碍了他们的实际用途。另一方面,E2E神经方法通常是数据贪婪,同时,非训练数据量不足通常会降低误用检测和诊断(MD&D)的疗效。作为回应,我们提出了一种新的MD&D方法,利用非归共(NAR)E2E神经建模,以大大加速推理时间,同时通过传统的E2E神经方法保持性能。此外,我们设计并开发了堆叠在我们的方法的NAR E2E模型之上的发音建模网络,以进一步提高MD&D的有效性。与某些直接的E2E模型和基于DNN-HMM声学模型构建的基于ICONIC发音评分的方法相比,在L2-arctic英语数据集上进行的经验实验似乎验证了我们方法的可行性。
translated by 谷歌翻译
It is well established in neuroscience that color vision plays an essential part in the human visual perception system. Meanwhile, many novel designs for computer vision inspired by human vision have achieved success in a wide range of tasks and applications. Nonetheless, how color differences affect machine vision has not been well explored. Our work tries to bridge this gap between the human color vision aspect of visual recognition and that of the machine. To achieve this, we curate two datasets: CIFAR10-F and CIFAR100-F, which are based on the foreground colors of the popular CIFAR datasets. Together with CIFAR10-B and CIFAR100-B, the existing counterpart datasets with information on the background colors of CIFAR test sets, we assign each image based on its color contrast level per its foreground and background color labels and use this as a proxy to study how color contrast affects machine vision. We first conduct a proof-of-concept study, showing the effect of color difference and validate our datasets. Furthermore, on a broader level, an important characteristic of human vision is its robustness against ambient changes; therefore, drawing inspirations from ophthalmology and the robustness literature, we analogize contrast sensitivity from the human visual aspect to machine vision and complement the current robustness study using corrupted images with our CIFAR-CoCo datasets. In summary, motivated by neuroscience and equipped with the datasets we curate, we devise a new framework in two dimensions to perform extensive analyses on the effect of color contrast and corrupted images: (1) model architecture, (2) model size, to measure the perception ability of machine vision beyond total accuracy. We also explore how task complexity and data augmentation play a role in this setup. Our results call attention to new evaluation approaches for human-like machine perception.
translated by 谷歌翻译
It is no secret that deep learning models exhibit undesirable behaviors such as learning spurious correlations instead of learning correct relationships between input/output pairs. Prior works on robustness study datasets that mix low-level features to quantify how spurious correlations affect predictions instead of considering natural semantic factors due to limitations in accessing realistic datasets for comprehensive evaluation. To bridge this gap, in this paper we first investigate how natural background colors play a role as spurious features in image classification tasks by manually splitting the test sets of CIFAR10 and CIFAR100 into subgroups based on the background color of each image. We name our datasets CIFAR10-B and CIFAR100-B. We find that while standard CNNs achieve human-level accuracy, the subgroup performances are not consistent, and the phenomenon remains even after data augmentation (DA). To alleviate this issue, we propose FlowAug, a semantic DA method that leverages the decoupled semantic representations captured by a pre-trained generative flow. Experimental results show that FlowAug achieves more consistent results across subgroups than other types of DA methods on CIFAR10 and CIFAR100. Additionally, it shows better generalization performance. Furthermore, we propose a generic metric for studying model robustness to spurious correlations, where we take a macro average on the weighted standard deviations across different classes. Per our metric, FlowAug demonstrates less reliance on spurious correlations. Although this metric is proposed to study our curated datasets, it applies to all datasets that have subgroups or subclasses. Lastly, aside from less dependence on spurious correlations and better generalization on in-distribution test sets, we also show superior out-of-distribution results on CIFAR10.1 and competitive performances on CIFAR10-C and CIFAR100-C.
translated by 谷歌翻译
This paper introduces the use of evolutionary algorithms for solving differential equations. The solution is obtained by optimizing a deep neural network whose loss function is defined by the residual terms from the differential equations. Recent studies have used stochastic gradient descent (SGD) variants to train these physics-informed neural networks (PINNs), but these methods can struggle to find accurate solutions due to optimization challenges. When solving differential equations, it is important to find the globally optimum parameters of the network, rather than just finding a solution that works well during training. SGD only searches along a single gradient direction, so it may not be the best approach for training PINNs with their accompanying complex optimization landscapes. In contrast, evolutionary algorithms perform a parallel exploration of different solutions in order to avoid getting stuck in local optima and can potentially find more accurate solutions. However, evolutionary algorithms can be slow, which can make them difficult to use in practice. To address this, we provide a set of five benchmark problems with associated performance metrics and baseline results to support the development of evolutionary algorithms for enhanced PINN training. As a baseline, we evaluate the performance and speed of using the widely adopted Covariance Matrix Adaptation Evolution Strategy (CMA-ES) for solving PINNs. We provide the loss and training time for CMA-ES run on TensorFlow, and CMA-ES and SGD run on JAX (with GPU acceleration) for the five benchmark problems. Our results show that JAX-accelerated evolutionary algorithms, particularly CMA-ES, can be a useful approach for solving differential equations. We hope that our work will support the exploration and development of alternative optimization algorithms for the complex task of optimizing PINNs.
translated by 谷歌翻译
To rigorously certify the robustness of neural networks to adversarial perturbations, most state-of-the-art techniques rely on a triangle-shaped linear programming (LP) relaxation of the ReLU activation. While the LP relaxation is exact for a single neuron, recent results suggest that it faces an inherent "convex relaxation barrier" as additional activations are added, and as the attack budget is increased. In this paper, we propose a nonconvex relaxation for the ReLU relaxation, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. We show that the nonconvex relaxation has a similar complexity to the LP relaxation, but enjoys improved tightness that is comparable to the much more expensive SDP relaxation. Despite nonconvexity, we prove that the verification problem satisfies constraint qualification, and therefore a Riemannian staircase approach is guaranteed to compute a near-globally optimal solution in polynomial time. Our experiments provide evidence that our nonconvex relaxation almost completely overcome the "convex relaxation barrier" faced by the LP relaxation.
translated by 谷歌翻译
Causal phenomena associated with rare events frequently occur across a wide range of engineering and mathematical problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links between random variables that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel algorithm that performs statistical independence tests on data collected from time-invariant dynamical systems in which rare but consequential events occur. We seek to understand if the state of the dynamical system causally affects the likelihood of the rare event. In particular, we exploit the time-invariance of the underlying data to superimpose the occurrences of rare events, thus creating a new dataset, with rare events are better represented, on which conditional independence tests can be more efficiently performed. We provide non-asymptotic bounds for the consistency of our algorithm, and validate the performance of our algorithm across various simulated scenarios, with applications to traffic accidents.
translated by 谷歌翻译
Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. We first propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Secondly, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Thirdly, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20k, CityScapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe even more performance improvement. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer
translated by 谷歌翻译
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one, under the constraints of having limited system size and computational cost, in which the main challenge comes from the "catastrophic forgetting" issue -- the inability to well remember the learnt knowledge while learning the new ones. With the specific focus on the class-incremental OCL scenario, i.e. OCL for classification, the recent advance incorporates the contrastive learning technique for learning more generalised feature representation to achieve the state-of-the-art performance but is still unable to fully resolve the catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce the semantically distinct augmentation technique, in which it leverages strong augmentation to generate more data samples, and we show that considering these samples semantically different from their original classes (thus being related to the out-of-distribution samples) in the contrastive learning mechanism contributes to alleviate forgetting and facilitate model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. softmax classifier and cross-entropy loss) are included in our model design for faster convergence and utilising the label information, but particularly equipped with a sampling strategy to tackle the tendency of favouring the new classes (i.e. model bias towards the recently learnt classes). Upon conducting extensive experiments on CIFAR-10, CIFAR-100, and Mini-Imagenet datasets, our proposed method is shown to achieve superior performance against various baselines.
translated by 谷歌翻译
对于单眼深度估计,获取真实数据的地面真相并不容易,因此通常使用监督的合成数据采用域适应方法。但是,由于缺乏实际数据的监督,这仍然可能会导致较大的域间隙。在本文中,我们通过从真实数据中生成可靠的伪基础真理来开发一个域适应框架,以提供直接的监督。具体而言,我们提出了两种用于伪标记的机制:1)通过测量图像具有相同内容但不同样式的深度预测的一致性,通过测量深度预测的一致性; 2)通过点云完成网络的3D感知伪标记,该网络学会完成3D空间中的深度值,从而在场景中提供更多的结构信息,以完善并生成更可靠的伪标签。在实验中,我们表明我们的伪标记方法改善了各种环境中的深度估计,包括在训练过程中使用立体声对。此外,该提出的方法对现实世界数据集中的几种最新无监督域的适应方法表现出色。
translated by 谷歌翻译